A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation
Abstract
Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for the clinical management of many infections. A variety of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and scale readily to large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis (where individuals are sampled sooner post-infection) rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method assigned about half (46%) as many individuals to clusters as the other methods did. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not in clusters extracted by the other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer.
This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where it is critical to robustly and accurately identify clusters for the most cost-effective deployment of outbreak management and prevention resources.
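To make the idea of a Markov-modulated Poisson process concrete, the following is a minimal sketch that simulates a two-state MMPP along a single time axis. It is not the authors' implementation: it does not operate on a phylogenetic tree and it does not fit any parameters. The function name `simulate_mmpp`, its arguments, and the rate values are all hypothetical and chosen only for illustration.

```python
import random


def simulate_mmpp(event_rates, switch_rates, t_end, seed=0):
    """Simulate a two-state MMPP on the interval [0, t_end).

    event_rates[s]  : Poisson event rate while the hidden chain is in state s
    switch_rates[s] : rate at which the hidden chain leaves state s
    Returns a list of (event_time, hidden_state) tuples in time order.
    All names and values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    t, state, events = 0.0, 0, []
    while True:
        # Two competing exponential clocks: emit an event, or switch state.
        total = event_rates[state] + switch_rates[state]
        t += rng.expovariate(total)
        if t >= t_end:
            return events
        if rng.random() < event_rates[state] / total:
            events.append((t, state))  # event emitted in the current state
        else:
            state = 1 - state  # hidden chain jumps to the other state


# State 0 is a hypothetical "slow" regime, state 1 a "fast" regime.
events = simulate_mmpp(event_rates=(1.0, 10.0),
                       switch_rates=(0.5, 0.5),
                       t_end=2000.0,
                       seed=1)
fast = sum(1 for _, s in events if s == 1)
slow = len(events) - fast
print(f"{len(events)} events: {slow} in the slow state, {fast} in the fast state")
```

In the simulated output, events pile up during stretches spent in the fast state. A clustering method built on this kind of model would work in the opposite direction, inferring which stretches were in the fast state from the observed spacing of events.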
Similar resources
Molecular Typing of Mycobacterium Tuberculosis Isolated from Iranian Patients Using Highly Abundant Polymorphic GC-Rich-Repetitive Sequence
Background: Tuberculosis (TB), with more than 10 million new cases per year and one of the top 10 causes of death worldwide, is still one of the most important global health problems. Multidrug-resistant tuberculosis (MDR) is also a serious danger to public health. Understanding of the epidemiological pattern of Mycobacterium tuberculosis (MTB), estimates of recent transmission and recurrence ...
Genetic Variation Among Salvia Species Based on Sequence-Related Amplified Polymorphism (SRAP) Marker
In this study, the SRAP molecular marker approach was used to investigate genetic diversity in the genus Salvia. A total of 205 DNA bands were produced from PCR amplification of 11 Salvia species and populations using 25 selective primer combinations, of which 204 were polymorphic genetic loci. The total number of amplified fragments ranged from 3 to 15. The genetic similarities of 11 coll...
The Spatial Allocation of Hospitals With Negative Pressure Isolation Rooms in Korea: Are We Prepared for New Outbreaks?
Background: Allocation of adequate healthcare facilities is one of the most important factors that public health policy-makers consider when preparing for infectious disease outbreaks. Negative pressure isolation rooms (NPIRs) are one of the critical resources for control of infectious respiratory diseases, such as the novel coronavirus disease 2019 (COVID-19) outbreak. However, there is insuffi...
Impacts and shortcomings of genetic clustering methods for infectious disease outbreaks
For infectious diseases, a genetic cluster is a group of closely related infections that is usually interpreted as representing a recent outbreak of transmission. Genetic clustering methods are becoming increasingly popular for molecular epidemiology, especially in the context of HIV where there is now considerable interest in applying these methods to prioritize groups for public health resour...
A Business Model to Detect Disease Outbreaks
Introduction: Every year, several disease outbreaks, such as influenza-like illnesses (ILI) and other contagious illnesses, impose various costs on public and non-government agencies. Most of these expenses arise from not being ready to handle such outbreaks. Appropriate preparation would reduce these expenses. A system that is able to recognize these outbreaks can earn ...
Journal title:
Volume 13, Issue
Pages -
Publication date: 2017